Codeswitching Detection via Lexical Features in Conditional Random Fields
Author
Abstract
Half of the world's population is estimated to be at least bilingual, and many people use multiple languages interchangeably to communicate effectively. The Second Workshop on Computational Approaches to Code Switching posed the task of labeling codeswitched Spanish-English (ES-EN) and Modern Standard Arabic-Dialect Arabic (MSA-DA) tweets. We built a Conditional Random Field (CRF) with well-rounded features that capture not only the two languages but also the other classes. On the ES-EN classification task, we obtained a weighted F1-score of 0.88 at the tweet level and an accuracy of 96.5% at the token level. On the MSA-DA classification task, our system obtained an F1-score of 0.66 at the tweet level and an overall token-level accuracy of 74.7%.
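The abstract does not list the features used, so the following is only a minimal sketch of what per-token lexical features for a CRF tagger might look like. Every feature name here (`prefix3`, `has_non_ascii`, `is_mention_or_hashtag`, and so on) and both helper functions are hypothetical illustrations, not the authors' feature set.

```python
import unicodedata


def token_features(tokens, i):
    """Build a feature dict for tokens[i].

    The feature set is a hypothetical illustration; the abstract does not
    specify which lexical features the system actually used.
    """
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        # Character affixes often hint at the language of a word.
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        # True only when every character is Unicode punctuation.
        "is_punct": bool(tok) and all(
            unicodedata.category(c).startswith("P") for c in tok
        ),
        # Twitter-specific tokens usually belong to a non-language class.
        "is_mention_or_hashtag": tok.startswith(("@", "#")),
        "has_non_ascii": any(ord(c) > 127 for c in tok),
    }
    # Neighbouring words give the model context around switch points.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of tweet
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of tweet
    return feats


def tweet_features(tokens):
    """Return one feature dict per token of a tokenized tweet."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A list of such per-token dicts, one list per tweet, is the input shape that common CRF toolkits accept; which toolkit the system was built on is not stated in the abstract.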
Similar Resources
Detection of Agreement and Disagreement in Broadcast Conversations
We present Conditional Random Fields based approaches for detecting agreement/disagreement between speakers in English broadcast conversation shows. We develop annotation approaches for a variety of linguistic phenomena. Various lexical, structural, durational, and prosodic features are explored. We compare the performance when using features extracted from automatically generated annotations a...
Identifying Agreement/Disagreement in Conversational Speech: A Cross-Lingual Study
This paper presents models for detecting agreement/disagreement between speakers in English and Arabic broadcast conversation shows. We explore a variety of features, including lexical, structural, durational, and prosodic features. We experiment with these features using Conditional Random Fields models and conduct systematic investigations on efficacy of various feature groups across language...
Automatic Prosodic Labeling with Conditional Random Fields and Rich Acoustic Features
Many acoustic approaches to prosodic labeling in English have employed only local classifiers, although text-based classification has employed some sequential models. In this paper we employ linear chain and factorial conditional random fields (CRFs) in conjunction with rich, contextually-based prosodic features, to exploit sequential dependencies and to facilitate integration with lexical feat...
Broadcast News Story Segmentation Using Conditional Random Fields and Multimodal Features
This paper proposes to integrate multi-modal features using conditional random fields (CRF) for broadcast news story segmentation. We study story boundary cues from lexical, audio and video modalities, where lexical features consist of lexical similarity, chain strength and overall cohesiveness, acoustic features involve pause duration, pitch, speaker change and audio event type, and visual fea...